AITopics | video clip

Collaborating Authors

video clip

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Two Causally Related Needles in a Video Haystack

Neural Information Processing Systems, Jun-14-2026, 07:33:34 GMT

Properly evaluating the ability of Video-Language Models (VLMs) to understand long videos remains a challenge. We propose a long-context video understanding benchmark, Causal2Needles, that assesses two crucial abilities insufficiently addressed by existing benchmarks: (1) extracting information from two separate locations (two needles) in a long video and understanding them jointly, and (2) modeling the world in terms of cause and effect in human behavior. Causal2Needles evaluates these abilities using noncausal one-needle, causal one-needle, and causal two-needle questions. The most complex question type, causal two-needle questions, requires extracting information about both the cause event and the effect event from a long video and its associated narration text. To prevent textual bias, we introduce two complementary question formats: locating the video clip that contains the answer, and giving a verbal description of a visual detail from that video clip. Our experiments reveal that models excelling on existing benchmarks struggle with causal two-needle questions, and that model performance is negatively correlated with the distance between the two needles.
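The abstract reports that model performance is negatively correlated with the distance between the two needles. A minimal sketch of how such a trend could be quantified, assuming each answered question is recorded as a hypothetical (needle distance in seconds, correct) pair: accuracy is bucketed by distance and a Pearson coefficient is computed over the buckets. The helper names, the record format, and the binning scheme are assumptions for illustration; the paper's own analysis procedure is not described here.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def accuracy_by_distance(
    results: Iterable[Tuple[float, bool]], bin_width: float
) -> List[Tuple[float, float]]:
    """Group hypothetical (needle_distance_seconds, correct) records into
    fixed-width distance bins and return sorted (bin_start, accuracy) pairs."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    bins = {}
    for distance, correct in results:
        start = int(distance // bin_width) * bin_width
        hits, total = bins.get(start, (0, 0))
        bins[start] = (hits + int(correct), total + 1)
    return [(start, hits / total) for start, (hits, total) in sorted(bins.items())]
```

A negative `pearson_r` over the binned (bin_start, accuracy) pairs would indicate that accuracy drops as the needles move farther apart. Correlating per-bin accuracy rather than raw 0/1 outcomes gives a more readable trend, but the result depends on the chosen `bin_width`.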

artificial intelligence, name change, proceedings

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.78)

185fdf627eaae2abab36205dcd19b817-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Apr-24-2026, 20:48:54 GMT

Appendix. The appendix is organized as follows. We also provide details of the annotation/calibration process and the baseline neural networks (NNs) in Sections D and E, respectively. We discuss the results for each weather condition and the use of the K-Radar dataset as a pre-training dataset for other Radar tensor datasets in Sections F and G, respectively. Finally, we introduce details of the devkits and list relevant URLs to help with understanding the content of the paper in Sections H and I, respectively. A.1 Additional samples of the K-Radar dataset and explanation of LPCs for each weather condition. In sleet (Figure 8-(e)) or heavy snow (Figure 8-(g)), the Lidar point cloud (LPC) measurements of some objects ahead are lost while the ego-vehicle is driving.

artificial intelligence, machine learning, weather condition

Neural Information Processing Systems

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Auslan-Daily: Australian Sign Language Translation for Daily Communication and News

Neural Information Processing Systems, Feb-18-2026, 04:02:11 GMT

Because different geographic regions generally have their own native sign languages, it is valuable to establish corresponding sign language translation (SLT) datasets to support related communication and research. Auslan, the sign language specific to Australia, still lacks a dedicated large-scale SLT dataset.

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Industry:

Education > Curriculum > Subject-Specific Education (0.96)
Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding (Supplementary Materials) Houlun Chen

Neural Information Processing Systems, Feb-12-2026, 07:33:15 GMT

Finally, generate a new brief description based mainly on the original description, incorporating the attribute information. The narratives should be similar to the original description.

artificial intelligence, information, machine learning

Neural Information Processing Systems

Country: Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.40)

Appendix

Neural Information Processing Systems, Feb-10-2026, 21:38:47 GMT

We evaluated all models on three additional tasks beyond those presented in the main paper. Point-of-no-return (PNR) temporal localization error: Given a video clip of a state change, the network has to estimate the time at which the state change begins. More specifically, the model tries to estimate the keyframe within the video clip that contains the point-of-no-return (the time when the state change begins). The occurrence of a state change is then predicted by training a binary linear classifier, using the concatenated representations as input. Action Recognition (AR) w/ audio: For this task, video embeddings from f_V and audio embeddings from f_A are concatenated and passed through two separate linear classifiers to classify the 'verb' and 'noun' of the action occurring in the video clip.
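The probing setups above reduce to simple operations on frozen features. A minimal sketch, assuming embeddings are plain Python lists standing in for the outputs of f_V and f_A, and that each classifier head is a hypothetical (weights, bias) pair: the AR w/ audio probe concatenates the two embeddings and applies separate verb and noun heads, and the PNR error is taken here as the absolute difference between the predicted and ground-truth keyframes, converted to seconds with an assumed frame rate.

```python
from typing import List, Sequence, Tuple

# Hypothetical head format: (weights with one row per class, one bias per class).
Head = Tuple[List[List[float]], List[float]]


def linear(weights: List[List[float]], bias: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute W x + b for a single input vector."""
    if len(weights) != len(bias):
        raise ValueError("one bias term is needed per output")
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the input dimension")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def classify_verb_noun(
    video_emb: Sequence[float], audio_emb: Sequence[float], verb_head: Head, noun_head: Head
) -> Tuple[int, int]:
    """Concatenate video and audio embeddings (video first, an assumed order),
    then apply two separate linear heads to predict verb and noun indices."""
    joint = list(video_emb) + list(audio_emb)
    verb_weights, verb_bias = verb_head
    noun_weights, noun_bias = noun_head
    verb_logits = linear(verb_weights, verb_bias, joint)
    noun_logits = linear(noun_weights, noun_bias, joint)
    return argmax(verb_logits), argmax(noun_logits)


def pnr_localization_error(pred_frame: int, true_frame: int, fps: float) -> float:
    """Absolute temporal distance, in seconds, between the predicted and
    ground-truth point-of-no-return keyframes (fps is an assumed input)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return abs(pred_frame - true_frame) / fps
```

For instance, at an assumed 30 fps, a prediction at frame 30 against a ground-truth keyframe at frame 45 gives an error of 0.5 seconds. Keeping the verb and noun heads separate mirrors the text: both labels share the concatenated input but not their classifier weights.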

artificial intelligence, machine learning, state change

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.57)

185fdf627eaae2abab36205dcd19b817-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Feb-7-2026, 16:45:52 GMT

dataset, video clip, weather condition

Neural Information Processing Systems

Country:

North America > United States (0.04)
Europe > United Kingdom (0.04)
Europe > Germany (0.04)
Asia > South Korea (0.04)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Communications (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

Neural Information Processing Systems, Dec-26-2025, 08:28:23 GMT

We introduce EgoSchema, a very long-form video question-answering dataset and benchmark for evaluating the long video understanding capabilities of modern vision and language systems. Derived from Ego4D, EgoSchema consists of over 5000 human-curated multiple-choice question-answer pairs, spanning over 250 hours of real video data and covering a very broad range of natural human activity and behavior. For each question, EgoSchema requires the correct answer to be selected from five given options based on a three-minute-long video clip. While some prior works have proposed video datasets with long clip lengths, we posit that the length of the video clip alone does not truly capture the temporal difficulty of the video task being considered. To remedy this, we introduce temporal certificate sets, a general notion for capturing the intrinsic temporal understanding length associated with a broad range of video understanding tasks and datasets.
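The abstract describes temporal certificate sets only at a high level. As a hedged illustration, assume a certificate set is a list of (start, end) subclips, in seconds, that a human needs to watch to verify the answer, and that its length is the total duration those subclips cover, with overlapping time counted once. Under those assumptions, the certificate length is the measure of a union of intervals; the function names below are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds), assumed format


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into sorted, disjoint intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if end < start:
            raise ValueError(f"invalid interval ({start}, {end})")
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def certificate_length(certificate_set: Iterable[Interval]) -> float:
    """Total duration covered by a set of certificate subclips,
    counting overlapping time only once."""
    return sum(end - start for start, end in merge_intervals(certificate_set))
```

For example, subclips (0, 10), (5, 20), and (100, 130) give a certificate length of 50 seconds, well below the three-minute clip length, which illustrates how clip length alone can overstate the temporal understanding a question actually requires.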

diagnostic benchmark, egoschema, name change

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.78)

Dynamic Normalization and Relay for Video Action Recognition

Neural Information Processing Systems, Dec-24-2025, 04:18:29 GMT

Convolutional Neural Networks (CNNs) have been the dominant model for video action recognition. Due to their large memory and compute demands, popular action recognition networks must be trained with small batch sizes, which makes learning discriminative spatial-temporal representations for videos challenging. In this paper, we present Dynamic Normalization and Relay (DNR), an improved normalization design that augments the spatial-temporal representation learning of any deep action recognition model and adapts to small-batch training settings. We observe that state-of-the-art action recognition networks usually apply the same normalization parameters to all video data, and ignore the dependencies of the estimated normalization parameters between neighboring frames (at the same layer) and between neighboring layers (with all frames of a video clip). Inspired by this, DNR introduces two dynamic normalization relay modules that exploit cross-temporal and cross-layer feature distribution dependencies to estimate accurate layer-wise normalization parameters. These two DNR modules are instantiated as a light-weight recurrent structure conditioned on the current input features and on the normalization parameters estimated from neighboring-frame features at the same layer or from whole-clip features at the preceding layers. We first plug DNR into prevailing 2D CNN backbones and test its performance on public action recognition datasets, including Kinetics and Something-Something. Experimental results show that DNR brings large performance improvements to the baselines, achieving absolute top-1 accuracy gains of over 4.4% without training bells and whistles.
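To make the cross-temporal relay idea concrete, the sketch below normalizes each frame's features with statistics that blend the current frame's mean and variance with the statistics relayed from the previous frame. This is a deliberately simplified stand-in, not DNR itself: the fixed blending weight `alpha` is a hypothetical parameter, whereas DNR estimates normalization parameters with a learned light-weight recurrent structure and also relays across layers, which this sketch omits.

```python
import math
from typing import List, Optional, Sequence, Tuple

Stats = Tuple[float, float]  # (mean, variance)


def frame_stats(features: Sequence[float]) -> Stats:
    """Mean and (population) variance of one frame's feature values."""
    if not features:
        raise ValueError("a frame must have at least one feature")
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return mean, var


def relay_normalize(
    frames: Sequence[Sequence[float]], alpha: float = 0.5, eps: float = 1e-5
) -> List[List[float]]:
    """Normalize frames in temporal order, blending each frame's statistics with
    the statistics relayed from the previous frame (hypothetical fixed alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    outputs: List[List[float]] = []
    relayed: Optional[Stats] = None
    for features in frames:
        mean, var = frame_stats(features)
        if relayed is not None:
            # Mix current-frame statistics with those carried from the neighbor.
            mean = alpha * mean + (1 - alpha) * relayed[0]
            var = alpha * var + (1 - alpha) * relayed[1]
        relayed = (mean, var)  # pass the blended statistics to the next frame
        outputs.append([(f - mean) / math.sqrt(var + eps) for f in features])
    return outputs
```

With `alpha=1.0` each frame is normalized independently; smaller values carry more information forward from neighboring frames, which is the kind of cross-frame dependency the abstract says standard normalization ignores.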

dynamic normalization and relay, name change, normalization parameter

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)

Filters

Two Causally Related Needles in a Video Haystack

185fdf627eaae2abab36205dcd19b817-Supplemental-Datasets_and_Benchmarks.pdf

Auslan-Daily: Australian Sign Language Translation for Daily Communication and News

VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding (Supplementary Materials) Houlun Chen

ccdf3864e2fa9089f9eca4fc7a48ea0a-Supplemental.pdf

Appendix

185fdf627eaae2abab36205dcd19b817-Supplemental-Datasets_and_Benchmarks.pdf

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

Dynamic Normalization and Relay for Video Action Recognition

477929b8d45ab759795b7aac94329b08-Supplemental-Datasets_and_Benchmarks_Track.pdf